{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Important classes of Spark SQL and DataFrames\n",
    "\n",
    "* `pyspark.sql.SQLContext`: main entry point for `DataFrame` and SQL functionality.\n",
    "* `pyspark.sql.DataFrame`: a distributed collection of data grouped into named columns.\n",
    "* `pyspark.sql.Column`: a column expression in a `DataFrame`.\n",
    "* `pyspark.sql.Row`: a row of data in a `DataFrame`.\n",
    "* `pyspark.sql.HiveContext`: main entry point for accessing data stored in Apache Hive.\n",
    "* `pyspark.sql.GroupedData`: aggregation methods, returned by `DataFrame.groupBy()`.\n",
    "* `pyspark.sql.DataFrameNaFunctions`: methods for handling missing data (null values), reached through `DataFrame.na`.\n",
    "* `pyspark.sql.DataFrameStatFunctions`: methods for statistics functionality, reached through `DataFrame.stat`.\n",
    "* `pyspark.sql.functions` (module): built-in functions available for `DataFrame`.\n",
    "* `pyspark.sql.types` (module): the available data types.\n",
    "* `pyspark.sql.Window`: utilities for defining windows used by window functions.\n",
    "\n",
    "Since Spark 2.0, `pyspark.sql.SparkSession` replaces `SQLContext` and `HiveContext` as the entry point; `SQLContext` is kept for backward compatibility."
   ]
  },
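  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pyspark import SparkContext\n",
    "from pyspark.sql import SQLContext, DataFrameStatFunctions\n",
    "\n",
    "# Only one SparkContext can be active per process; run sc.stop() first if one already exists\n",
    "sc = SparkContext(master=\"local[3]\")  # run Spark locally with 3 worker threads\n",
    "\n",
    "# SQLContext wraps the SparkContext and is the entry point for DataFrame and SQL functionality\n",
    "sqlContext = SQLContext(sc)"
   ]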
  },
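  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next cell is a minimal sketch of how several of the classes listed at the top fit together: `Row` objects become a `DataFrame`, `Column` expressions are built with `pyspark.sql.functions`, `groupBy` returns a `GroupedData`, and `df.na` exposes `DataFrameNaFunctions`. The employee data (`name`, `dept`, `salary`) and the app name are made up for illustration. `SparkContext.getOrCreate` and `SQLContext.getOrCreate` reuse the contexts created above when they exist, so the cell also runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark import SparkConf, SparkContext\n",
    "from pyspark.sql import SQLContext, Row, Column, GroupedData\n",
    "from pyspark.sql import functions as F\n",
    "\n",
    "# Reuse the running contexts if present; otherwise start a local one (app name is arbitrary)\n",
    "sc = SparkContext.getOrCreate(SparkConf().setMaster('local[3]').setAppName('classes-demo'))\n",
    "sqlContext = SQLContext.getOrCreate(sc)\n",
    "\n",
    "# Hypothetical sample data; None becomes a null salary\n",
    "rows = [Row(name='Alice', dept='eng', salary=100.0),\n",
    "        Row(name='Bob', dept='eng', salary=None),\n",
    "        Row(name='Carol', dept='ops', salary=80.0)]\n",
    "df = sqlContext.createDataFrame(rows)  # schema is inferred from the Row fields\n",
    "\n",
    "# A Column is an unevaluated expression; nothing runs until an action such as show()\n",
    "raised = (F.col('salary') * 1.1).alias('salary_raised')\n",
    "assert isinstance(raised, Column)\n",
    "\n",
    "# groupBy returns GroupedData; agg() turns it back into a DataFrame\n",
    "grouped = df.groupBy('dept')\n",
    "assert isinstance(grouped, GroupedData)\n",
    "per_dept = grouped.agg(F.avg('salary').alias('avg_salary'))\n",
    "\n",
    "# df.na is a DataFrameNaFunctions; drop the rows whose salary is null\n",
    "complete = df.na.drop(subset=['salary'])\n",
    "\n",
    "df.select('name', raised).show()\n",
    "per_dept.show()\n",
    "print(complete.count())  # 2"
   ]
  },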
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## DataFrameStatFunctions\n",
    "\n",
    "Methods for statistics functionality, reached through `DataFrame.stat` (for example `df.stat.corr('x', 'y')`); each method is also available directly on the `DataFrame`. [Documented here](http://takwatanabe.me/pyspark/generated/generated/pyspark.sql.DataFrameStatFunctions.html).\n",
    "\n",
    "* **approxQuantile(col, probabilities, relativeError)**: Calculates the approximate quantiles of a numerical column of a DataFrame. `relativeError` must be >= 0; a value of 0 computes exact quantiles, which can be expensive. Available since Spark 2.0.\n",
    "* **corr(col1, col2[, method])**: Calculates the correlation of two columns of a DataFrame as a double value. Currently only the Pearson correlation coefficient (`method='pearson'`) is supported.\n",
    "* **cov(col1, col2)**: Calculates the sample covariance of the two columns, given by name, as a double value.\n",
    "* **crosstab(col1, col2)**: Computes a pair-wise frequency table (contingency table) of the given columns.\n",
    "* **freqItems(cols[, support])**: Finds frequent items for columns, possibly with false positives. `support` defaults to 1%.\n",
    "* **sampleBy(col, fractions[, seed])**: Returns a stratified sample without replacement, based on the sampling fraction given for each stratum; strata missing from `fractions` are treated as fraction 0."
   ]
  },
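  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the numeric methods above (`corr`, `cov`, `approxQuantile`), using a made-up DataFrame with `x = 1..10` and `y = 2 * x`. With this data `corr` should be 1.0 up to floating-point rounding, and `cov` should equal the sample covariance 165/9 (about 18.33). `approxQuantile` needs Spark 2.0 or later. The contexts are obtained with `getOrCreate` (app name arbitrary) so the cell also runs on its own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark import SparkConf, SparkContext\n",
    "from pyspark.sql import SQLContext, Row\n",
    "\n",
    "# Reuse the running contexts if present; otherwise start a local one (app name is arbitrary)\n",
    "sc = SparkContext.getOrCreate(SparkConf().setMaster('local[3]').setAppName('stat-demo'))\n",
    "sqlContext = SQLContext.getOrCreate(sc)\n",
    "\n",
    "# Hypothetical data: x = 1..10 and y = 2 * x\n",
    "df = sqlContext.createDataFrame([Row(x=float(i), y=2.0 * i) for i in range(1, 11)])\n",
    "\n",
    "# Pearson correlation and sample covariance, both returned as Python floats\n",
    "r = df.stat.corr('x', 'y')\n",
    "c = df.stat.cov('x', 'y')\n",
    "\n",
    "# Quartiles of x; relativeError=0.0 asks for exact quantiles (costly on large data)\n",
    "q = df.stat.approxQuantile('x', [0.25, 0.5, 0.75], 0.0)\n",
    "\n",
    "print(r)\n",
    "print(c)\n",
    "print(q)"
   ]
  },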
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "DataFrameStatFunctions.corr?"
   ]
  },
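  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the remaining DataFrameStatFunctions methods (`crosstab`, `freqItems`, `sampleBy`) on a made-up DataFrame with a `key` column (strata `a`, `b`, `c`) and a `label` column. The support threshold, sampling fractions and seed are arbitrary choices. Because `c` is left out of `fractions`, none of its rows are sampled, while `b` with fraction 1.0 keeps all of its rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark import SparkConf, SparkContext\n",
    "from pyspark.sql import SQLContext, Row\n",
    "\n",
    "# Reuse the running contexts if present; otherwise start a local one (app name is arbitrary)\n",
    "sc = SparkContext.getOrCreate(SparkConf().setMaster('local[3]').setAppName('stat-demo'))\n",
    "sqlContext = SQLContext.getOrCreate(sc)\n",
    "\n",
    "# Hypothetical data: 60 rows, key a/b/c appears 30/20/10 times\n",
    "pairs = [('a', 'x'), ('a', 'y'), ('a', 'x'), ('b', 'x'), ('b', 'y'), ('c', 'y')] * 10\n",
    "df = sqlContext.createDataFrame([Row(key=k, label=l) for k, l in pairs])\n",
    "\n",
    "# Contingency table: one row per key, one column per label; the first column is named key_label\n",
    "ct = df.stat.crosstab('key', 'label')\n",
    "ct.show()\n",
    "\n",
    "# Items that may occur in more than 40% of rows (false positives are possible)\n",
    "fi = df.stat.freqItems(['key', 'label'], support=0.4)\n",
    "fi.show(truncate=False)\n",
    "\n",
    "# Stratified sample without replacement: about half of a, all of b, none of c\n",
    "sampled = df.stat.sampleBy('key', fractions={'a': 0.5, 'b': 1.0}, seed=42)\n",
    "sampled.groupBy('key').count().show()"
   ]
  },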
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda root]",
   "language": "python",
   "name": "conda-root-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.13"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "64px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
